Semantics, Discourse and Statistical Machine Translation

نویسندگان

  • Deyi Xiong
  • Min Zhang
چکیده

In the past decade, statistical machine translation (SMT) has been advanced from word-based SMT to phraseand syntax-based SMT. Although this advancement produces significant improvements in BLEU scores, crucial meaning errors and lack of cross-sentence connections at discourse level still hurt the quality of SMT-generated translations. More recently, we have witnessed two active movements in SMT research: one towards combining semantics and SMT in attempt to generate not only grammatical but also meaningpreserved translations, and the other towards exploring discourse knowledge for document-level machine translation in order to capture intersentence dependencies. The emergence of semantic SMT are due to the combination of two factors: the necessity of semantic modeling in SMT and the renewed interest of designing models tailored to relevant NLP/SMT applications in the semantics community. The former is represented by recent numerous studies on exploring word sense disambiguation, semantic role labeling, bilingual semantic representations as well as semantic evaluation for SMT. The latter is reflected in CoNLL shared tasks, SemEval and SenEval exercises in recent years. The need of capturing cross-sentence dependencies for document-level SMT triggers the resurgent interest of modeling translation from the perspective of discourse. Discourse phenomena, such as coherent relations, discourse topics, lexical cohesion that are beyond the scope of conventional sentence-level n-grams, have been recently considered and explored in the context of SMT. This tutorial aims at providing a timely and combined introduction of such recent work along these two trends as discourse is inherently connected with semantics. The tutorial has three parts. The first part critically reviews the phraseand syntax-based SMT. The second part is devoted to the lines of research oriented to semantic SMT, including a brief introduction of semantics, lexical and shallow semantics tailored to SMT, semantic representations in SMT, semantically motivated evaluation as well as advanced topics on deep semantic learning for SMT. The third part is dedicated to recent work on SMT with discourse, including a brief review on discourse studies from linguistics and computational viewpoints, discourse research from monolingual to multilingual, discourse-based SMT and a few advanced topics. The tutorial is targeted for researchers in the SMT, semantics and discourse communities. In particular, the expected audience comes from two groups: 1) Researchers and students in the SMT community who want to design cutting-edge models and algorithms for semantic SMT with various semantic knowledge and representations, and who would like to advance SMT from sentence-bysentence translation to document-level translation with discourse information; 2) Researchers and students from the semantics and discourse community who are interested in developing models and methods and adapting them to SMT.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Towards a discourse relation-aware approach for Chinese-English machine translation

Translation of discourse relations is one of the recent efforts of incorporating discourse information to statistical machine translation (SMT). While existing works focus on disambiguation of ambiguous discourse connectives, or transformation of discourse trees, only explicit discourse relations are tackled. A greater challenge exists in machine translation of Chinese, since implicit discourse...

متن کامل

Discourse-level features for statistical machine translation

The talk will show how the disambiguation of discourse connectives can improve their automatic translation. Connectives are a class of frequent functional lexical items that play an important role in text readability and coherence. Longer-range context is taken into account to learn the signaled rhetorical relations. The labels obtained from a discourse connective classifier are then integrated...

متن کامل

Word Sense Induction for Machine Translation

We have witnessed the research progress of machine translation from phrase/syntax-based to semanticsbased and from single sentence-based to discourse and document-based. This talk presents our work of word sense-based translation model for statistical machine translation, which is one of semantics-based SMT research at word sense level. The sense in which a word is used determines the translati...

متن کامل

Disambiguating Temporal–Contrastive Discourse Connectives for Machine Translation

Temporal–contrastive discourse connectives (although, while, since, etc.) signal various types of relations between clauses such as temporal, contrast, concession and cause. They are often ambiguous and therefore difficult to translate from one language to another. We discuss several new and translation-oriented experiments for the disambiguation of a specific subset of discourse connectives in...

متن کامل

A Corpus-Based Approach to Text Partition

A text partition model is proposed to determine the boundaries of discourse structures. It is based on association of noun-noun relations and noun-verb relations defined on discourse level and sentence level. Three factors are considered: 1) repetition of words, 2) importance of words, and 3) collocational semantics. Ten texts serve as experimental objects. The applications of the results to se...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014